Abstract

Compressed sensing deals with the reconstruction of sparse signals using a small number of linear 
measurements. One of the main challenges in compressed sensing is to find the support of a sparse 
signal. In the literature, several bounds on the scaling law of the number of measurements for successful
support recovery have been derived, with the main focus on random Gaussian measurement matrices.

In this paper, we investigate the noisy support recovery problem from an estimation theoretic point
of view, where no specific assumption is made on the underlying measurement matrix. The linear
measurements are perturbed by additive white Gaussian noise. We define the output of a support estimator
to be a set of position values in increasing order. We set the error between the true and estimated supports
as the $\ell_2$-norm of their difference. On the one hand, this choice allows us to use the machinery behind
the $\ell_2$-norm error metric and, on the other hand, converts support recovery into a more intuitive
and geometrical problem. First, by using the Hammersley-Chapman-Robbins (HCR) bound, we derive a
fundamental lower bound on the performance of any unbiased estimator of the support set. This lower
bound provides us with necessary conditions on the number of measurements for reliable $\ell_2$-norm support
recovery, which we specifically evaluate for uniform Gaussian measurement matrices. Then, we analyze
the maximum likelihood estimator and derive conditions under which the HCR bound is achievable.
This leads us to a number of measurements for the optimum decoder that is sufficient for reliable
$\ell_2$-norm support recovery, and shows that the performance of the optimum decoder has only a 9 dB gap
compared to the HCR lower bound. Using this framework, we specifically evaluate sufficient conditions
on the number of measurements for uniform Gaussian measurement matrices.

Index Terms 

Compressed sensing, compressive sampling, support recovery, Hammersley-Chapman-Robbins bound, 
Cramer-Rao bound, unbiased estimator, maximum-likelihood estimator 

I. Introduction 

Linear sampling of sparse signals, with the number of measurements close to their sparsity level, has
recently received a lot of attention under the names of compressed sensing (CS), compressive sampling
or sparse sampling [2]-[5]. A $k$-sparse signal $\theta \in \mathbb{R}^p$ is defined as a signal with $k \ll p$ nonzero expansion
coefficients in some orthonormal basis or frame. The goal of compressed sensing is to find measurement
matrices $\Phi \in \mathbb{R}^{m \times p}$, followed by reconstruction algorithms, which allow robust recovery of sparse signals using
the least number of measurements $m$ and low computational complexity; see for example [6]-[11].

Support recovery refers to the problem of estimating the positions of the non-zero entries of $\theta$ based
on a set of observations. In the noiseless setting, the optimal algorithm requires $m = k + 1$ samples, at the
expense of high computational complexity, to obtain the true support set [12], while $m = O(k \log(p/k))$
measurements are needed for reconstruction algorithms based on linear programming [13]. In the
same context, it is shown that $m = 2k + 1$ samples are sufficient for shift-invariant measurement matrices
using recovery algorithms based on annihilating filters [14].

In practice, however, all measurements are noisy due to physical restrictions, quantization precision,
etc. A large body of recent work has established bounds on the number of measurements required for
successful support set recovery in the noisy setting. Denoting by $\theta_{\min}$ the smallest non-zero coefficient (in magnitude) of
the sparse vector $\theta$, the authors in [15], [16] derived the scaling law on the number of measurements as a
function of $(p, k, \theta_{\min})$ for $\ell_1$-constrained quadratic programming, also referred to as Lasso, to recover
the sparsity pattern. In the context of the optimal decoding algorithm, the results in [17], [18] provide
necessary and sufficient conditions for perfect support recovery under the Gaussian measurement
ensemble. Considering fractional support recovery, the study in [19] provides a set of necessary and
sufficient conditions on the required number of measurements as a function of the fraction of the support
that can be reliably recovered.

In this paper, we look at the support recovery problem from an estimation theoretic point of view, where
the error metric between the true and the estimated support is the $\ell_2$-norm of their difference. In some
applications, e.g., [20], it is important that the recovered sparsity pattern be as close as possible to the true
support set. In these cases, the $\ell_2$-norm error metric is an appropriate choice, since the assigned
penalty grows quadratically with the distance. Moreover, this choice allows us to use the machinery
behind the $\ell_2$-norm error metric, which makes the theorems and the proofs geometrical and more intuitive.
While no specific assumption is made on the underlying measurement matrix, we assume that the linear
measurements are perturbed by additive white Gaussian noise. Since the positions of the nonzero entries
of $\theta$ form a set of $k$ discrete values (e.g., integers between 1 and $p$), the support recovery problem can
be regarded as estimating restricted parameters. This leads us to use the Hammersley-Chapman-Robbins
(HCR) bound, which provides a lower bound on the variance of any unbiased estimator of a set of
restricted parameters [21], [22]. The HCR bound is a generalization of the Cramer-Rao (CR) bound [23]
and holds under much weaker regularity conditions, while giving substantially tighter bounds in general.
Using the HCR bound, we derive in a straightforward manner necessary conditions on
the required number of measurements for the standard Gaussian ensemble.

Of equal interest are the conditions under which the HCR bound is achievable (tight). To this end,
we study the performance of the maximum likelihood estimator (MLE) and derive conditions under
which it becomes unbiased and achieves the HCR bound. In particular, this leads us to sufficient
conditions on the number of measurements for reliable $\ell_2$-norm support recovery using the standard
Gaussian measurement ensemble. Note that when the error of $\ell_2$-norm support recovery vanishes,
so does that of the regular support recovery problem with the $\{0, 1\}$ error metric. Therefore, the derived
sufficient condition also applies to support recovery under the $\{0, 1\}$ error metric.

The organization of the paper is as follows. In Section III, we provide a more precise formulation of
the problem. We derive the HCR bound for the support recovery problem in Section IV, followed
by necessary conditions on the number of measurements for the standard Gaussian measurement
ensemble. By studying the performance of the MLE in Section V, we derive conditions under which
the HCR bound becomes achievable. Finally, under the standard Gaussian measurement ensemble, we
identify a sufficient number of measurements for reliable $\ell_2$-norm support recovery.

II. Previous Work 

The problem of sparsity recovery has received considerable attention in the literature in both the
noiseless and noisy settings, see e.g., [7], [15], [17]-[19], [24], [25]. The results focus on the asymptotic
scaling of the number of measurements for almost-sure success of the reconstruction of sparse inputs. In
this section, we give an overview of the previous work most closely related to the results of this paper.
The work in [17] provides necessary and sufficient conditions on the number of measurements in
the high-dimensional and noisy setting for reliable sparsity recovery using an optimal decoder. In that
setup, the measurements are contaminated by i.i.d. Gaussian noise and the analysis is high dimensional,
meaning that the sparsity level $k$, the signal dimension $p$ and the number of measurements $m$ tend
to infinity simultaneously. Under the condition $(m - k)\,\theta_{\min}^2 \to +\infty$, the author derives the following
sufficient condition for asymptotically reliable recovery with the optimal decoder:

$$ m > C \max\left\{ k \log(p/k),\ \frac{\log(p - k)}{\theta_{\min}^2} \right\}, \tag{1} $$

where $C > 0$ is a fixed constant. Moreover, it is also shown in [17] that

$$ m > \frac{C}{\theta_{\min}^2} \log\frac{p}{k} $$

is a necessary condition for some fixed constant $C > 0$. By simplifying the sufficient condition (1)
in the sublinear sparsity regime $k = o(p)$, it is shown that the number of measurements required by
$\ell_1$-constrained quadratic programming (Lasso), given by $m = \Theta(k \log(p - k))$ [15], achieves the
information-theoretic necessary bound.

In [18], the authors derive the necessary scaling

$$ m > \frac{2}{\mathrm{MAR} \cdot \mathrm{SNR}}\, k \log(p - k) + k - 1 \tag{2} $$

for the uniform i.i.d. Gaussian measurement ensemble, which holds at any finite SNR and for all algorithms.
The term MAR denotes the minimum-to-average ratio of the input sparse signal. Moreover, they show
that for a fixed SNR and MAR, the simple maximum correlation estimator (MCE) achieves the same
scaling as in (2). The MCE selects the indices of the $k$ columns of the measurement matrix having the
highest correlation with the measurement vector. More precisely, the results indicate that MCE needs

$$ m > \frac{8(1 + \mathrm{SNR})}{\mathrm{MAR} \cdot \mathrm{SNR}}\, k \log(p - k) \tag{3} $$

measurements to succeed with high probability. Therefore, the simple MCE also achieves the same scaling
law as Lasso.

In a more general setting, support recovery with some distortion measure has been considered
in [9], [19], [26]. The results in [19] show that if the SNR does not increase with the signal dimension,
exact support recovery is not possible. Moreover, they show that partial support recovery is possible
with a bounded SNR per sample, which indicates that a finite rate per sample is sufficient. In this regard,
our work can be viewed as the support recovery problem with the $\ell_2$-norm distortion measure. In the
following, we explain our setup for the estimation theoretic approach to support recovery.
III. Problem statement

In this paper, we consider a deterministic signal model, in which $\theta \in \mathbb{R}^p$ is a fixed but unknown vector
with exactly $k$ non-zero entries. We refer to $k$ as the signal sparsity and to $p$ as the signal dimension, and define
the support vector $s(\theta)$ as the positions of the non-zero elements of $\theta$. More precisely,

$$ s(\theta) = (n_1, n_2, \ldots, n_k), $$

where we assume that $n_1 < n_2 < \cdots < n_k$. The corresponding non-zero entries of $\theta$ form a vector

$$ \theta_s = (\theta_{n_1}, \theta_{n_2}, \ldots, \theta_{n_k}). $$

Suppose we are given a vector of $m$ noisy observations $y \in \mathbb{R}^m$ of the form

$$ y = \Phi\theta + e, $$

where $\Phi \in \mathbb{R}^{m \times p}$ is the measurement matrix and $e \sim \mathcal{N}(0, \sigma^2 I_{m \times m})$ is additive i.i.d. Gaussian noise.
Throughout this paper, we assume that $\sigma^2$ is fixed, since any scaling of $\sigma^2$ can be accounted for in the
scaling of $\theta$. Let $x = \Phi\theta$, let $\Phi_s$ denote the matrix composed of the columns of $\Phi$ at positions indexed
by $s(\theta)$, and let $\mathcal{S}\{\Phi_s\}$ denote the column span of $\Phi_s$. Since there are $N = \binom{p}{k}$ subspaces of dimension
$k$, a number from 1 to $N$ can be assigned to them and, w.l.o.g., we assume that $x$ belongs to the first
subspace $\mathcal{S}\{\Phi_{s_1}\}$. From now on, for simplicity, we refer to the first subspace as $\mathcal{S}\{\Phi_s\}$. Moreover, we
need to assume that any $2k$ columns of the measurement matrix $\Phi$ are linearly independent. Under this
assumption, we have $\theta \neq \theta' \Leftrightarrow x \neq x'$, i.e., there is a one-to-one correspondence between $k$-sparse
vectors and their images $x$.

Due to the presence of noise, $\theta$ cannot be recovered exactly. However, a sparse-recovery algorithm
outputs an estimate $\theta'$. In the support recovery problem, we are only interested in estimating the support.
To that end, we can consider different performance metrics for the quality of estimation. In [15], the
measure of error between the estimate and the true signal is a $\{0, 1\}$-valued loss function,

$$ \rho_1(s, s') \triangleq \mathbb{I}(s \neq s'), $$

where $\mathbb{I}(\cdot)$ is the indicator function. This metric is appropriate for exact support recovery. In this work,
we are interested in approximate support recovery, where the goal is to recover a sparsity pattern as
close as possible to the true support set. For this purpose, we consider the following $\ell_2$-norm error metric

$$ \rho_2(s, s') \triangleq \|s - s'\|^2, $$

where throughout this paper, $\|\cdot\|$ refers to the Euclidean norm. Note that $\rho_2(s, s') = 0 \Leftrightarrow \rho_1(s, s') = 0$.
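To make the two error metrics concrete, here is a minimal Python sketch. The helper names `support`, `rho1` and `rho2` are our own (hypothetical), and positions are 1-based as in $s(\theta)$. It shows that $\rho_1$ charges a near miss and a far miss equally, whereas $\rho_2$ grows with the squared displacement of the misplaced positions.

```python
from typing import Sequence, Tuple


def support(theta: Sequence[float]) -> Tuple[int, ...]:
    """s(theta): 1-based positions of the non-zero entries, in increasing order."""
    return tuple(i + 1 for i, v in enumerate(theta) if v != 0)


def rho1(s: Sequence[int], s_prime: Sequence[int]) -> int:
    """{0,1}-valued loss: 1 if the two support vectors differ, else 0."""
    return int(tuple(s) != tuple(s_prime))


def rho2(s: Sequence[int], s_prime: Sequence[int]) -> int:
    """Squared Euclidean distance between two support vectors of the same length k."""
    if len(s) != len(s_prime):
        raise ValueError("support vectors must both have length k")
    return sum((a - b) ** 2 for a, b in zip(s, s_prime))


s = support([0.0, 3.0, 0.0, -1.0, 0.0])   # (2, 4)
print(rho1(s, (2, 5)), rho2(s, (2, 5)))    # 1 1     (near miss)
print(rho1(s, (2, 90)), rho2(s, (2, 90)))  # 1 7396  (far miss, same rho1)
```

Both estimates are simply "wrong" under $\rho_1$, while $\rho_2$ separates them by several orders of magnitude.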



November 25, 2009 



DRAFT 






As mentioned in [17], SNR alone is not a suitable quantity for the support recovery problem.
It is possible to generate a set of problem instances for which support recovery becomes arbitrarily
unreliable, in particular, by letting the smallest coefficient go to zero (assuming that $k > 1$) at an arbitrary
rate, even though the SNR becomes arbitrarily large by increasing the rest. As also observed in [17], the
magnitude of the smallest nonzero entry of $\theta$ is prominent in the phrasing of results. Hence, we define

$$ \theta_{\min} = \min_{i \in s(\theta)} |\theta_i|. $$

In particular, our results apply to any unbiased estimator that operates over the signal class 

$$ \mathcal{C}(\theta_{\min}) = \{\theta \in \mathbb{R}^p : |\theta_i| \geq \theta_{\min}\ \ \forall i \in s(\theta)\}. $$

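Our analysis is high dimensional in nature, in the sense that the signal dimension $p$ goes to infinity. More
precisely, we say that $\ell_2$-norm support recovery is reliable if

$$ \lim_{p \to \infty} \mathbb{E}\left[\rho_2\big(s(\theta), \hat{s}(\theta)\big)\right] = 0, \tag{4} $$

for any $\theta \in \mathcal{C}(\theta_{\min})$ under some scaling of $\{\theta_{\min}, k, m\}$ as a function of $p$, where $\hat{s}(\theta)$ is the estimated
support of $\theta$. For unbiased estimators, whose mean squared error equals the trace of their covariance, (4) is equivalent to

$$ \lim_{p \to \infty} \mathrm{tr}\left[\mathrm{cov}(\hat{s}(\theta))\right] = 0, $$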

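where

$$ \mathrm{cov}(\hat{s}(\theta)) = \mathbb{E}\left[\big(\hat{s}(\theta) - \mathbb{E}[\hat{s}(\theta)]\big)^T \big(\hat{s}(\theta) - \mathbb{E}[\hat{s}(\theta)]\big)\right], $$

and $\mathrm{tr}[\cdot]$ is the matrix trace operation. Since the support estimate is based on $y$, with a slight abuse of
notation, we also denote it by $\hat{s}(y)$.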
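The equivalence between (4) and the trace condition rests on the decomposition $\mathbb{E}[\rho_2(s, \hat{s})] = \mathrm{tr}[\mathrm{cov}(\hat{s})] + \|\mathbb{E}[\hat{s}] - s\|^2$, whose last term vanishes for unbiased estimators. The following sketch (hypothetical helper names, Python standard library only) checks this on a handful of support estimates, using population normalization for the sample covariance.

```python
from typing import List, Sequence


def empirical_trace_cov(supports: List[Sequence[int]]) -> float:
    """Trace of the (population-normalized) sample covariance of support estimates.

    This equals the average squared distance of the estimates from their mean vector.
    """
    n, k = len(supports), len(supports[0])
    mean = [sum(s[j] for s in supports) / n for j in range(k)]
    return sum(sum((s[j] - mean[j]) ** 2 for j in range(k)) for s in supports) / n


def empirical_rho2(supports: List[Sequence[int]], true_s: Sequence[int]) -> float:
    """Average rho_2 between the estimates and the true support."""
    return sum(sum((a - b) ** 2 for a, b in zip(s, true_s)) for s in supports) / len(supports)


# Estimates whose mean is exactly the true support (1, 3): both quantities coincide.
est = [(1, 2), (1, 4), (1, 3), (1, 3)]
print(empirical_trace_cov(est), empirical_rho2(est, (1, 3)))  # 0.5 0.5
```

If the estimates are biased (for example, always $(1, 4)$ when $s = (1, 3)$), the trace is zero while the average $\rho_2$ is not, which is why the trace criterion is only used for unbiased estimators.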

With this setup, our first goal is to find necessary conditions on the parameters $\{p, m, k, \theta_{\min}\}$ that must
be satisfied by any unbiased estimator for reliable $\ell_2$-norm support recovery. The results are applicable to
any measurement matrix, and we specifically evaluate them for standard Gaussian measurement matrices.
Our second goal is to find sufficient conditions for successful support recovery using the optimum
decoder. We show that under appropriate conditions, the performance of the optimum decoder is close to
the theoretical lower bound for the performance of unbiased support estimators. Again, as a special
case, we evaluate the sufficient conditions for standard Gaussian measurement matrices.
IV. Hammersley-Chapman-Robbins Bound

The Cramer-Rao (CR) bound is a well-known tool in statistics which provides a lower bound on the
variance of the error of any unbiased estimator of an unknown deterministic parameter $\delta$ from a set of
measurements $y$ [23]. More specifically, in a single-parameter scenario, any unbiased estimate $\hat{\delta}$ satisfies

$$ \mathrm{var}(\hat{\delta}) \geq \frac{1}{\mathbb{E}\left[\left(\frac{\partial \log P(y; \delta)}{\partial \delta}\right)^2\right]}, \tag{5} $$

where $P(y; \delta)$ is the pdf of the measurements, which depends on the parameter $\delta$. As (5) suggests, the
CR bound is derived for estimating a continuous parameter.

In many cases, there is a priori information on the estimated parameter which restricts it to take values
from a predetermined set. An example is the estimation of the mean of a normal distribution when one
knows that the true mean is an integer (see the example below). In such scenarios, the Hammersley-Chapman-Robbins
(HCR) bound provides a stronger lower bound on the variance of any unbiased
estimator [21], [22]. More precisely, let us assume that the set of observations $y = (y_1, y_2, \ldots, y_m)$
is drawn according to a probability distribution with density function $P(y; \delta)$, where $\delta$ is a parameter
belonging to some parameter set $\Lambda$ (e.g., the set of integers) that completely characterizes the
pdf. In addition, the sequence $\delta$ is partitioned into two subsequences $\delta = (\delta_1, \delta_2)$, where we are only
interested in estimating the parameters included in the subsequence $\delta_1$. Let $\hat{\delta}_1(y)$ denote an unbiased
estimator of $\delta_1$. Given the above definitions, we recall the following result.

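Theorem 1 ([21], [22]): The trace of the covariance matrix of any unbiased estimator of $\delta_1$ is bounded
below by

$$ \mathrm{tr}[\mathrm{cov}(\hat{\delta}_1)] \geq \sup_{\delta'} \frac{\|\delta_1 - \delta_1'\|^2}{\int \frac{P^2(y; \delta')}{P(y; \delta)}\, dy - 1}, \tag{6} $$

in which $\delta' = (\delta_1', \delta_2') \in \Lambda$. The set $\Lambda$ is chosen so that $\delta'$ takes values according to the a priori
information.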

Example 1: For clarity, let us consider the performance of any unbiased estimator of (only) the mean
of a normal distribution based on independent samples of size $m$, i.e., $y = (y_1, \ldots, y_m)$. In this case,
$\delta = (\mu, \sigma^2)$, $\delta_1 = \mu$, $\delta_2 = \sigma^2$ and

$$ P(y; \delta) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^m e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \mu)^2}. $$

Let $\hat{\mu}(y)$ denote an unbiased estimator of $\mu$, which is the parameter we want to estimate. When there is
no prior information on $\mu$, it follows from the CR bound that

$$ \mathrm{var}(\hat{\mu}) \geq \sigma^2/m. \tag{7} $$






Once the mean is restricted to be an integer, we may write $\delta_1 = \mu$ and $\delta_1' = \mu + \alpha$, where $\alpha$ is a non-zero
integer (keeping $\delta_2' = \sigma^2$). Then, upon integration in (6), we get

$$ \mathrm{var}(\hat{\mu}) \geq \max_{\alpha \neq 0} \frac{\alpha^2}{e^{m\alpha^2/\sigma^2} - 1} \tag{8} $$

$$ = \frac{1}{e^{m/\sigma^2} - 1}, \tag{9} $$

where the maximum is attained for $\alpha = \pm 1$. A point worth mentioning is the role of the prior information.
While (7) decays only as $1/m$, (9) decreases exponentially with respect to the number of observations. It is
also interesting to note that (8) applies as well to the case in which the parameter is not restricted. We
then have to deal with the maximization in (8) over $\alpha$, where $\alpha$ may take any value (not
necessarily an integer) except $\alpha = 0$. Since the right-hand side of (8) is a decreasing function of $|\alpha|$, we let
$\alpha \to 0$ and get (7).

A. Performance Lower Bound 

In the support recovery problem, we know a priori that each entry of the support vector takes values 
from the restricted set $\Lambda = \{1, 2, \ldots, p\}$. Hence, the HCR bound can provide us with a lower bound on
the performance of any unbiased estimator of the support set. 

Theorem 2: Assume $\hat{s}(y)$ to be an unbiased estimator of the support $s$. The HCR lower bound on the
variance of $\hat{s}(y)$ is given by

$$ \mathrm{tr}[\mathrm{cov}(\hat{s})] \geq \max_{i \in \{2, \ldots, N\}} \frac{\|s - s_i\|^2}{e^{\|x - p_{s_i}x\|^2/\sigma^2} - 1}, \tag{10} $$

in which $p_{s_i}x$ denotes the projection of $x$ onto $\mathcal{S}\{\Phi_{s_i}\}$.

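Proof: Since our observations are of the form $y = \Phi\theta + e$, the set of unknown parameters $\delta$ consists
of the support vector $s(\theta) = (n_1, n_2, \ldots, n_k)$ and the corresponding coefficients $\theta_s = (\theta_{n_1}, \theta_{n_2}, \ldots, \theta_{n_k})$.
We are only interested in estimating the support; hence, $\delta_1 = s(\theta)$ and $\delta_2 = \theta_s$. Then

$$ \frac{P^2(y; \delta')}{P(y; \delta)} = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{\frac{(y_i - x_i)^2 - 2(y_i - x_i')^2}{2\sigma^2}}, $$

where $x' = \Phi\theta'$. Completing the square in each $y_i$, i.e., $(y_i - x_i)^2 - 2(y_i - x_i')^2 = -(y_i - 2x_i' + x_i)^2 + 2(x_i - x_i')^2$, and integrating, we get

$$ \int \frac{P^2(y; \delta')}{P(y; \delta)}\, dy = e^{\|x - x'\|^2/\sigma^2}. $$

Using the HCR bound (6), we derive

$$ \mathrm{tr}[\mathrm{cov}(\hat{s})] \geq \sup_{\delta'} \frac{\|s - s'\|^2}{e^{\|x - x'\|^2/\sigma^2} - 1}. \tag{11} $$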
If $x$ and $x'$ live in the same subspace, i.e., $s = s'$, the right-hand side of (11) will be zero. Therefore,
in order to find the supremum, we can restrict our attention to all the signals which do not live in the
same subspace as $x$ does:

$$ \mathrm{tr}[\mathrm{cov}(\hat{s})] \geq \sup_{s' \neq s}\ \sup_{x' \in \mathcal{S}\{\Phi_{s'}\}} \frac{\|s - s'\|^2}{e^{\|x - x'\|^2/\sigma^2} - 1}. \tag{12} $$

For each sequence $s'$, the numerator of (12) is fixed (it is the squared $\ell_2$ distance between the supports and does
not depend on the coefficients), while the denominator is minimized by setting $x' = p_{s'}x$. This leads
to (10). ■

Corollary 1: For any support vector $s_i \neq s$, we have

$$ \mathrm{tr}[\mathrm{cov}(\hat{s})] \geq \frac{\|s - s_i\|^2}{e^{\|x - p_{s_i}x\|^2/\sigma^2} - 1}. $$

In the following, we see how Theorem 2 helps us to find a lower bound on the number of measurements
for reliable $\ell_2$-norm support recovery.
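Evaluating (10) exactly requires visiting all $N - 1$ competing supports, so it is only feasible for toy sizes. The following NumPy sketch, with hypothetical function names, does this by brute force: each projection $p_{s_i}x$ is computed by least squares on the columns indexed by $s_i$ (1-based), and any single term of the loop is the bound of Corollary 1. It assumes that the non-zero entries of `theta` define the true support $s$.

```python
import itertools

import numpy as np


def residual_sq(Phi: np.ndarray, x: np.ndarray, supp: tuple) -> float:
    """||x - p_s x||^2, projecting x onto the span of the columns listed in supp (1-based)."""
    cols = Phi[:, [i - 1 for i in supp]]
    coef, *_ = np.linalg.lstsq(cols, x, rcond=None)
    r = x - cols @ coef
    return float(r @ r)


def hcr_support_bound(Phi: np.ndarray, theta: np.ndarray, sigma2: float) -> float:
    """Right-hand side of (10): maximum over all competing k-subsets (exhaustive, toy sizes only)."""
    p = Phi.shape[1]
    s = tuple(int(i) + 1 for i in np.flatnonzero(theta))  # true support, 1-based
    x = Phi @ theta
    best = 0.0
    for s_i in itertools.combinations(range(1, p + 1), len(s)):
        if s_i == s:
            continue
        num = sum((a - b) ** 2 for a, b in zip(s, s_i))    # ||s - s_i||^2
        den = np.expm1(residual_sq(Phi, x, s_i) / sigma2)  # e^{||x - p_{s_i} x||^2 / sigma^2} - 1
        best = max(best, num / den)
    return float(best)


rng = np.random.default_rng(0)
Phi = rng.standard_normal((6, 8))   # m = 6, p = 8
theta = np.zeros(8)
theta[[1, 4]] = [0.5, -0.4]         # k = 2, true support (2, 5)
print(hcr_support_bound(Phi, theta, sigma2=1.0))
```

Increasing `sigma2` can only increase every term of the maximum, so the bound grows with the noise level, as expected.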

B. Necessary Conditions 

Using the HCR bound, Theorem 2 provides a lower bound on the performance of any unbiased estimator
for the $\ell_2$-norm support recovery problem. In words, $\ell_2$-norm support recovery is unreliable if the right-hand
side of (10) is bounded away from zero, which yields a lower bound on the minimum number of
measurements. However, finding the maximum in (10) requires a search through an exponential number
of subspaces. Instead, as Corollary 1 suggests, any subspace different from the true one will provide
us with a lower bound. In the following, we show how this result leads to necessary conditions for
random Gaussian measurement matrices.

Theorem 3: Let the measurement matrix $\Phi \in \mathbb{R}^{m \times p}$ be drawn with i.i.d. elements from a standard
Gaussian distribution $\mathcal{N}(0, 1)$. The $\ell_2$-norm support recovery over the signal class $\mathcal{C}(\theta_{\min})$ is unreliable
if

$$ m < \max\left\{ k,\ \frac{\sigma^2 \log(p - k)}{2\theta_{\min}^2} \right\}. $$

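Proof: The $\ell_2$-norm support recovery is reliable if (4) holds for any $\theta \in \mathcal{C}(\theta_{\min})$. Consider a $\theta$ with
$s(\theta) = (1, 2, \ldots, k)$ which takes $\theta_{\min}$ as its last non-zero entry, i.e., $\theta_k = \theta_{\min}$. From Corollary 1, we
have

$$ \mathrm{tr}[\mathrm{cov}(\hat{s})] \geq \frac{\|s - s'\|^2}{e^{\|x - x'\|^2/\sigma^2} - 1} \tag{13} $$

for any $x' = \Phi\theta' \in \mathcal{S}\{\Phi_{s'}\}$, since $\|x - p_{s'}x\| \leq \|x - x'\|$. In particular, let $\theta'$ have the support $s(\theta') = (1, 2, \ldots, k - 1, p)$ with coefficients
equal to those of $\theta$ in the first $k - 1$ positions and $\theta_p' = \theta_{\min}$. We show that if $m$ does not satisfy the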
condition of the theorem, then the right-hand side of (13) will be bounded away from zero for this specific $\theta'$, and
therefore the estimation is unreliable.

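Note that $\|s - s'\|^2 = (p - k)^2$ and $x - x' = \Phi(\theta - \theta') = \theta_{\min}(\phi_k - \phi_p)$, where $\phi_j$ denotes the $j$-th column of $\Phi$. Since $\phi_k - \phi_p \sim \mathcal{N}(0, 2I_m)$, this implies that

$$ \frac{\|x - x'\|^2}{\sigma^2} = \frac{\theta_{\min}^2\, \|\phi_k - \phi_p\|^2}{\sigma^2} = \frac{2\theta_{\min}^2}{\sigma^2}\, Z, $$

where $Z = \|\phi_k - \phi_p\|^2/2 \sim \chi^2(m)$ has a chi-square distribution with $m$ degrees of freedom. It is known that a central
chi-square random variable with $m$ degrees of freedom satisfies

$$ \Pr\left(Z - m \geq 2\sqrt{mt} + 2t\right) \leq e^{-t} \tag{14} $$

for all $t > 0$ [27]. Let $a \triangleq \sigma^2 \log(p - k)/(2\theta_{\min}^2)$ and assume that

$$ m < (1 - C)\, a \tag{15} $$

for some constant $0 < C < 1$. Evaluating (14) at the value of $t$ for which $2\sqrt{mt} + 2t = a - m$, namely $t = \left(\sqrt{2a - m} - \sqrt{m}\right)^2/4$, leads to

$$ \Pr\left(Z \geq a\right) \leq \exp\left(-\frac{\left(\sqrt{2a - m} - \sqrt{m}\right)^2}{4}\right) \leq \exp\left(-\frac{\left(\sqrt{1 + C} - \sqrt{1 - C}\right)^2}{4}\, a\right), \tag{16} $$

where the last step uses (15), since $\sqrt{2a - m} - \sqrt{m}$ is decreasing in $m$.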



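Note that the right-hand side of (16) converges to zero as $p$ grows, whenever $\log(p - k)/\theta_{\min}^2 \to \infty$. Therefore,

$$ \Pr\left(\frac{\|x - x'\|^2}{\sigma^2} < \log(p - k)\right) = \Pr\left(Z < \frac{\sigma^2 \log(p - k)}{2\theta_{\min}^2}\right) \to 1. $$

On this event, the denominator of (13) is smaller than $e^{\log(p - k)} - 1 = p - k - 1$, so the right-hand side of (13) exceeds $(p - k)^2/(p - k - 1)$.
This shows that the right-hand side of (13) is bounded away from zero with high probability and, therefore, the
estimation error does not vanish asymptotically.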

■ 
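The probabilistic step in the proof of Theorem 3 can be sanity-checked by simulation. The sketch below (illustrative names; NumPy assumed) draws only the two relevant Gaussian columns, estimates the probability that $\|x - x'\|^2/\sigma^2 < \log(p - k)$, and evaluates the upper-tail bound obtained from $\Pr(Z - m \geq 2\sqrt{mt} + 2t) \leq e^{-t}$ [27] at the $t$ for which the deviation equals $a - m$, with $a = \sigma^2 \log(p - k)/(2\theta_{\min}^2)$. The parameter values are arbitrary choices for illustration, not values from the paper.

```python
import math

import numpy as np


def tail_bound(a: float, m: int) -> float:
    """Bound on Pr(Z >= a), Z ~ chi2(m), a > m: e^{-t} at the t with 2 sqrt(m t) + 2 t = a - m."""
    t = (math.sqrt(2 * a - m) - math.sqrt(m)) ** 2 / 4
    return math.exp(-t)


def prob_close(p: int, k: int, m: int, theta_min: float, sigma2: float = 1.0,
               trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr(||x - x'||^2 / sigma2 < log(p - k)), x - x' = theta_min (phi_k - phi_p)."""
    rng = np.random.default_rng(seed)
    # Only the columns phi_k and phi_p matter; their entries are i.i.d. N(0, 1).
    diff = rng.standard_normal((trials, m)) - rng.standard_normal((trials, m))
    dist2 = theta_min ** 2 * np.sum(diff ** 2, axis=1) / sigma2
    return float(np.mean(dist2 < math.log(p - k)))


p, k, theta_min = 10**6, 10, 0.3
a = math.log(p - k) / (2 * theta_min ** 2)   # sigma^2 = 1; here a is about 76.7
print(tail_bound(a, m=20), prob_close(p, k, m=20, theta_min=theta_min))
```

With $m = 20$ well below $a$, the tail bound is tiny and the simulated probability is essentially one, so the HCR term stays bounded away from zero for this competing support.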

Table I shows the necessary conditions for different scalings of $k$ and $\theta_{\min}$ as a function of $p$.

Up to this point, we have discussed the HCR bound and its application in finding necessary conditions
on the number of measurements for reliable $\ell_2$-norm support recovery for Gaussian measurement matrices.
In the following, we find conditions under which the HCR bound is achievable and, as a result, find a
sufficient number of measurements for reliable $\ell_2$-norm support recovery.

V. Achievability of the HCR Bound

We now analyze the performance of the maximum likelihood estimator (MLE) for $\ell_2$-norm support
recovery and find conditions under which it becomes unbiased and, in addition, its performance approaches
that of the HCR bound. We then apply this result to derive a sufficient number of measurements
for standard Gaussian measurement matrices.
A. MLE performance

Provided that any $2k$ columns of the measurement matrix $\Phi$ are linearly independent, the noiseless
measurement vector $x = \Phi\theta$ belongs to one and only one of the $N$ possible subspaces. Since the noise
$e \in \mathbb{R}^m$ is i.i.d. Gaussian, the MLE selects the subspace closest to the observed vector $y \in \mathbb{R}^m$. More
precisely,

$$ \hat{s}_{ML} = \arg\min_{s : |s| = k} \|y - p_s y\|. $$

Now consider another subspace $\mathcal{S}\{\Phi_{s'}\}$ of dimension $k$, where $s' \neq s$. Clearly, an error happens when
the MLE selects the support $s'$ in place of the true support $s$. Let $\Pr_{ML}(s')$ denote the probability that the MLE
outputs the support vector $s'$ instead of $s$, among all possible support vectors.
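The ML rule above can be written as an exhaustive search. The sketch below is a minimal NumPy illustration (the function name `ml_support` is ours): for every $k$-subset it fits $y$ by least squares on the corresponding columns and keeps the subset with the smallest residual. Its cost grows like $\binom{p}{k}$, so it is meant for toy problems only, not as a practical decoder.

```python
import itertools

import numpy as np


def ml_support(Phi: np.ndarray, y: np.ndarray, k: int) -> tuple:
    """Exhaustive ML decoder: argmin over |s| = k of ||y - p_s y||; returns a 1-based support."""
    best_s, best_r = None, np.inf
    for s in itertools.combinations(range(Phi.shape[1]), k):
        cols = Phi[:, list(s)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        r = float(np.sum((y - cols @ coef) ** 2))  # squared distance from y to span(Phi_s)
        if r < best_r:
            best_s, best_r = s, r
    return tuple(i + 1 for i in best_s)


rng = np.random.default_rng(1)
m, p, k, sigma = 8, 12, 2, 0.1
Phi = rng.standard_normal((m, p))
theta = np.zeros(p)
theta[[2, 7]] = [1.0, -1.5]   # true support (3, 8)
y = Phi @ theta + sigma * rng.standard_normal(m)
print(ml_support(Phi, y, k))
```

With noiseless data ($y = \Phi\theta$) the true support has zero residual, so the decoder returns it whenever the columns are in general position.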

Lemma 1: Let $y = x + e$, where $x = \Phi\theta \in \mathcal{S}\{\Phi_s\}$, $e \sim \mathcal{N}(0, \sigma^2 I)$, and let $s'$ be a support set different
from $s$. Then

$$ \Pr_{ML}(s') \leq \Pr\left(\|e\| \geq \frac{\|x - p_{s'}x\|}{2}\right). $$

Proof: See Appendix A-A. ■
Let the minimum distance between $x$ and its projections onto other subspaces be

$$ d_{\min} = \min_{s' : s' \neq s} \|x - p_{s'}x\|, $$

and the distinguishability factor $\beta$ be defined as

$$ \beta \triangleq \frac{d_{\min}^2}{4m\sigma^2}. $$
Lemma 2: Let $y = x + e$, where $x = \Phi\theta \in \mathcal{S}\{\Phi_s\}$ and $e \sim \mathcal{N}(0, \sigma^2 I)$. Moreover, assume that the
number of measurements $m$ is an even integer, and $\beta > 1$. Then, the probability that the MLE makes an
error in choosing $s$ is upper bounded by

$$ \Pr_{ML}(\mathrm{err}) \leq \frac{m}{2}\, c(\beta)^{-\beta m}, $$

where $c(\beta) = e^{(\beta - 1)/(2\beta)}/\beta^{1/(2\beta)} > 1$ and $c(\beta) \to \sqrt{e}$ as $\beta$ grows.

Proof: See Appendix A-B. ■
Based on Lemma 2, the probability of error of the MLE is governed by the minimum distance between
$x$ and its projections onto the other subspaces. In the following theorem, we provide a bound on the
performance of the MLE.
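The quantities in Lemma 2 are easy to evaluate. In the sketch below (hypothetical helper names, standard library only), `beta_factor` implements $\beta = d_{\min}^2/(4m\sigma^2)$, assuming this definition as indicated by the proof of Theorem 5, `c_of_beta` implements $c(\beta)$, and `lemma2_bound` the bound $\frac{m}{2}c(\beta)^{-\beta m}$. As our own sanity check (not the appendix proof), it is compared with the exact tail $\Pr(\chi^2(m) \geq \beta m)$ for even $m$: an ML error can only occur if $\|e\| \geq d_{\min}/2$, and $d_{\min}^2/(4\sigma^2) = \beta m$, so this tail upper-bounds the ML error probability.

```python
import math


def beta_factor(d_min: float, m: int, sigma2: float) -> float:
    """Distinguishability factor beta = d_min^2 / (4 m sigma^2)."""
    return d_min ** 2 / (4 * m * sigma2)


def c_of_beta(beta: float) -> float:
    """c(beta) = exp((beta - 1) / (2 beta)) / beta^(1 / (2 beta)); exceeds 1 for beta > 1."""
    return math.exp((beta - 1) / (2 * beta)) / beta ** (1 / (2 * beta))


def lemma2_bound(beta: float, m: int) -> float:
    """Lemma 2 bound (m / 2) * c(beta)^(-beta m) on the ML error probability."""
    return (m / 2) * c_of_beta(beta) ** (-beta * m)


def chi2_tail_even(x: float, m: int) -> float:
    """Exact Pr(chi2(m) >= x) for even m: exp(-x/2) * sum_{j < m/2} (x/2)^j / j!."""
    h = x / 2
    return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(m // 2))


beta = beta_factor(d_min=12.0, m=6, sigma2=1.0)   # 144 / 24 = 6
print(beta, c_of_beta(beta), lemma2_bound(beta, 6), chi2_tail_even(beta * 6, 6))
```

Algebraically, $c(\beta)^{-\beta m} = (\beta e^{1 - \beta})^{m/2}$, which makes the exponential decay in $m$ for fixed $\beta > 1$ explicit.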

Theorem 4: Let $\beta > 1$ and $m > (1 + \epsilon)\log(p)/(\beta \log c(\beta))$ for some fixed $\epsilon > 0$. Then, the MLE is
asymptotically unbiased as $p \to \infty$, namely,

$$ \lim_{p \to \infty} \mathbb{E}(\hat{s}_{ML}) = s. $$

Moreover, its performance is bounded by

$$ \mathrm{tr}[\mathrm{cov}(\hat{s}_{ML})] \leq \frac{kp^2 m}{2}\, c(\beta)^{-\beta m}, \tag{17} $$

in which $c(\beta) = e^{(\beta - 1)/(2\beta)}/\beta^{1/(2\beta)} > 1$ and $c(\beta) \to \sqrt{e}$ as $\beta$ grows.

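Proof: Let $\hat{s} = (\hat{n}_1, \hat{n}_2, \ldots, \hat{n}_k)$ be the ML estimate of the true support set $s = (n_1, n_2, \ldots, n_k)$.
Then

$$ \mathbb{E}(\hat{s}) = \sum_{i=1}^{N} s_i \Pr_{ML}(s_i) = s \Pr_{ML}(s) + \sum_{s_i \neq s} s_i \Pr_{ML}(s_i). $$

Since $\sum_{s_i \neq s} \Pr_{ML}(s_i) = \Pr_{ML}(\mathrm{err})$ and $1 \leq \hat{n}_i \leq p$, we have, entrywise,

$$ \sum_{s_i \neq s} s_i \Pr_{ML}(s_i) \leq (p, p, \ldots, p)\, \Pr_{ML}(\mathrm{err}), \tag{18} $$

in which $\Pr_{ML}(\mathrm{err})$ denotes the probability that the MLE makes an error. Combining (18) and Lemma 2, we
get

$$ \lim_{p \to \infty} \sum_{s_i \neq s} s_i \Pr_{ML}(s_i) \leq \lim_{p \to \infty} (p, p, \ldots, p)\, \frac{m}{2}\, c(\beta)^{-\beta m} \overset{(a)}{=} 0, $$

where in (a) we used $m > (1 + \epsilon)\log(p)/(\beta \log c(\beta))$, i.e., $c(\beta)^{-\beta m} < p^{-(1 + \epsilon)}$. Likewise, $\Pr_{ML}(s) = 1 - \Pr_{ML}(\mathrm{err}) \to 1$ as $p \to \infty$. Hence,
$\lim_{p \to \infty} \mathbb{E}(\hat{s}) = s$. For the second part, we need to compute the asymptotic behavior of $\mathrm{tr}[\mathrm{cov}(\hat{s})]$ as
$p \to \infty$. By definition,

$$ \mathrm{tr}[\mathrm{cov}(\hat{s})] = \mathbb{E}\left(\|\hat{s} - \mathbb{E}(\hat{s})\|^2\right). $$

Since the mean minimizes the expected squared deviation, we can write

$$ \mathrm{tr}[\mathrm{cov}(\hat{s})] \leq \sum_{s_i \neq s} \Pr_{ML}(s_i)\, \|s_i - s\|^2 \overset{(a)}{\leq} kp^2 \sum_{s_i \neq s} \Pr_{ML}(s_i) \overset{(b)}{\leq} \frac{kp^2 m}{2}\, c(\beta)^{-\beta m}, $$

where in (a) we used the fact that $\|s_i - s\|^2$ is bounded by $kp^2$, and for (b) we used Lemma 2. ■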
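By Theorem 4, the MLE is asymptotically unbiased and, therefore, its estimation error is lower bounded
by the HCR bound. Moreover, the MLE performance upper bound in (17) has only a 9 dB gap in the
denominator compared to the HCR lower bound in (10): for large $\beta$, $c(\beta)^{\beta m} \approx e^{\beta m/2} = e^{d_{\min}^2/(8\sigma^2)}$ (since $\beta m = d_{\min}^2/(4\sigma^2)$), whereas the HCR denominator for the closest competing subspace grows as $e^{d_{\min}^2/\sigma^2}$, and $10\log_{10} 8 \approx 9$ dB. Therefore, such asymptotic behavior of the MLE
shows the achievability of the HCR bound under the mentioned conditions.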
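The 9 dB figure can be reproduced from the exponents alone. Assuming $\beta = d_{\min}^2/(4m\sigma^2)$, the HCR denominator for the closest competing subspace grows as $e^{d_{\min}^2/\sigma^2} = e^{4m\beta}$, while the decay factor in (17) is $c(\beta)^{-\beta m} = e^{-m((\beta - 1) - \log\beta)/2}$. The short Python sketch below (illustrative names) computes the ratio of the two exponents, which decreases towards 8, i.e., $10\log_{10} 8 \approx 9.03$ dB, as $\beta$ grows.

```python
import math


def mle_exponent(beta: float, m: int) -> float:
    """log of c(beta)^(beta m) = m ((beta - 1) - log(beta)) / 2, the exponent in the MLE bound (17)."""
    return m * ((beta - 1) - math.log(beta)) / 2


def hcr_exponent(beta: float, m: int) -> float:
    """d_min^2 / sigma^2 = 4 m beta, the exponent of the HCR denominator for the closest subspace."""
    return 4 * m * beta


for beta in (10.0, 100.0, 1000.0):
    ratio = hcr_exponent(beta, 8) / mle_exponent(beta, 8)
    print(beta, ratio, 10 * math.log10(ratio))   # ratio decreases towards 8 (about 9 dB)
```

The convergence is slow (the ratio is still about 8.06 at $\beta = 1000$) because of the $\log\beta$ correction in the exponent.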
As we observe, our results do not depend on any specific measurement matrix. In the following, we
see how these results lead us to a sufficient number of measurements for reliable $\ell_2$-norm support
recovery when the Gaussian measurement ensemble is used.

B. Sufficient Conditions 

Theorem 4 provides us with a bound on the performance of the MLE. For reliable $\ell_2$-norm support
recovery, the right-hand side of (17) should go to zero as $p \to \infty$. To that end, as required by Theorem 4,
one should make sure, first, that $\beta$ is bounded away from one, which is a property of the underlying
measurement matrix, and second, that the number of measurements is at least of the order of $\log p$. Note
that these conditions also imply that the MLE is asymptotically unbiased and, therefore, its performance is
bounded by the HCR bound.

In the following, we study the above two conditions for random Gaussian measurement matrices, which
will provide us with a sufficient number of measurements for reliable $\ell_2$-norm support recovery.

Theorem 5: Let the measurement matrix $\Phi$ be drawn with i.i.d. elements from the standard Gaussian
distribution $\mathcal{N}(0, 1)$. If the minimum coefficient value of the signal satisfies $\theta_{\min}^2/\sigma^2 > c$ for a constant $c$,
then $m = \Theta\left(k \log\frac{p - k}{k}\right)$ measurements suffice to ensure reliable $\ell_2$-norm support recovery.
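Proof: To ensure that $\beta > 1$, we need to find the scaling for which

$$ \lim_{p \to \infty} \Pr\left(\min_{s' \neq s} \|x - p_{s'}x\|^2 > 4m\sigma^2\right) = 1, \tag{19} $$

where $x = \Phi\theta$ and $s'$ goes through all support vectors different from $s = s_1$ (i.e., from $s_2$ to $s_N$). We
have

$$ \|x - p_{s'}x\|^2 = \|P_{s'}^{\perp}\Phi\theta\|^2, $$

where $P_{s'}^{\perp}$ is the orthogonal projector onto the orthogonal complement of $\mathcal{S}\{\Phi_{s'}\}$. Since the projection operator $P_{s'}^{\perp}$ cancels out any vector which lives in the subspace $\mathcal{S}\{\Phi_{s'}\}$, we can
write

$$ \|x - p_{s'}x\|^2 = \|P_{s'}^{\perp}\Phi_{s/s'}\theta_{s/s'}\|^2, $$

where $s/s'$ denotes the elements of $s$ which do not belong to $s'$. Now, since

$$ \Phi_{s/s'}\theta_{s/s'} \sim \mathcal{N}\left(0, \|\theta_{s/s'}\|^2 I_m\right) $$

is independent of $\Phi_{s'}$, and the range of the orthogonal projector $P_{s'}^{\perp}$ is of dimension $m - k$, we get

$$ \|x - p_{s'}x\|^2 = \|\theta_{s/s'}\|^2\, X_{s,s'}, \qquad X_{s,s'} \sim \chi^2(m - k). \tag{20} $$

Let $\mathcal{A}_j$ denote the event $\{x : \|x - p_{s_j}x\|^2 > 4m\sigma^2\}$. Then,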
$$ \Pr\left(\min_{s' \neq s} \|x - p_{s'}x\|^2 > 4m\sigma^2\right) = \Pr\left(\bigcap_{j=2}^{N} \mathcal{A}_j\right) \overset{(a)}{\geq} 1 - \sum_{j=2}^{N} \Pr(\mathcal{A}_j^c), $$

where in (a) we used the union bound. In order to satisfy (19), we seek conditions under which the sum
$\sum_{j=2}^{N} \Pr(\mathcal{A}_j^c)$ tends to zero. Each individual term in this sum can be written as



$$ \Pr(\mathcal{A}_j^c) = \Pr\left(X_{s,s_j} \leq \frac{4m\sigma^2}{\|\theta_{s/s_j}\|^2}\right). \tag{21} $$



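Since $X_{s,s_j} \sim \chi^2(m - k)$ (see (20)), we can apply the following large deviation bound for central
$\chi^2$ distributions [27]:

$$ \Pr\left(X_{s,s_j} - (m - k) \leq -2\sqrt{(m - k)\,x_j}\right) \leq e^{-x_j}, \tag{22} $$

which is valid for all $x_j > 0$. Now, define

$$ x_j = \left(\frac{\sqrt{m - k}}{2} - \frac{2m\sigma^2}{\|\theta_{s/s_j}\|^2\sqrt{m - k}}\right)^2, \tag{23} $$

so that $(m - k) - 2\sqrt{(m - k)\,x_j} = 4m\sigma^2/\|\theta_{s/s_j}\|^2$, and assume $\theta_{\min}^2/\sigma^2 > 8$. Hence, due to the fact that $2k < m$, we have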



$$ \frac{2m\sigma^2}{\|\theta_{s/s_j}\|^2 (m - k)} < \frac{4\sigma^2}{\theta_{\min}^2} < \frac{1}{2}. \tag{24} $$

Therefore, by evaluating (22) for $x_j$ in (23) and using (24), we have



$$ \Pr(\mathcal{A}_j^c) \leq \exp\left(-\left(\frac{\sqrt{m - k}}{2} - \frac{2m\sigma^2}{\|\theta_{s/s_j}\|^2\sqrt{m - k}}\right)^2\right). $$
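Each term $\Pr(\mathcal{A}_j^c)$ can be compared with its bound numerically. The sketch below (hypothetical names; NumPy assumed; $\sigma^2 = 1$ by default) computes $x_j = \left(\frac{\sqrt{m-k}}{2} - \frac{2m\sigma^2}{\|\theta_{s/s_j}\|^2\sqrt{m-k}}\right)^2$, checks that $(m - k) - 2\sqrt{(m - k)x_j}$ equals the threshold $4m\sigma^2/\|\theta_{s/s_j}\|^2$, and estimates the left tail of $\chi^2(m - k)$ by Monte Carlo. The inputs $m = 40$, $k = 5$, $\|\theta_{s/s_j}\|^2 = 9$ are arbitrary values satisfying $2k < m$ and $\theta_{\min}^2/\sigma^2 > 8$.

```python
import math

import numpy as np


def x_j(m: int, k: int, theta_norm2: float, sigma2: float = 1.0) -> float:
    """x_j as defined in the text, with theta_norm2 = ||theta_{s/s_j}||^2."""
    r = math.sqrt(m - k)
    return (r / 2 - 2 * m * sigma2 / (theta_norm2 * r)) ** 2


def per_term_bound(m: int, k: int, theta_norm2: float, sigma2: float = 1.0) -> float:
    """Bound exp(-x_j) on Pr(A_j^c) from the chi-square lower-tail inequality."""
    return math.exp(-x_j(m, k, theta_norm2, sigma2))


def empirical_term(m: int, k: int, theta_norm2: float, sigma2: float = 1.0,
                   trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr(X <= 4 m sigma2 / theta_norm2) for X ~ chi2(m - k)."""
    rng = np.random.default_rng(seed)
    samples = rng.chisquare(m - k, trials)
    return float(np.mean(samples <= 4 * m * sigma2 / theta_norm2))


m, k, theta_norm2 = 40, 5, 9.0   # 2k < m and theta_min^2 / sigma^2 = 9 > 8
threshold = (m - k) - 2 * math.sqrt((m - k) * x_j(m, k, theta_norm2))
print(threshold, 4 * m / theta_norm2)   # the two numbers agree
print(empirical_term(m, k, theta_norm2), per_term_bound(m, k, theta_norm2))
```

The bound is loose for a single term, but it decays exponentially in $m - k$, which is what the union bound over all competing supports needs.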

Let $\ell_j = |s/s_j|$ be the number of indices in $s$ not present in $s_j$. Then

$$ \|\theta_{s/s_j}\|^2 \geq \ell_j\, \theta_{\min}^2. $$



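Let the symbols $\lambda_j$ and $\nu_\ell$ be defined as

$$ \lambda_j = \frac{\sqrt{m - k}}{2} - \frac{2m\sigma^2}{\|\theta_{s/s_j}\|^2\sqrt{m - k}}, \qquad \nu_\ell = \frac{\sqrt{m - k}}{2} - \frac{2m\sigma^2}{\ell\,\theta_{\min}^2\sqrt{m - k}}. \tag{25} $$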
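Then, the bound $\|\theta_{s/s_j}\|^2 \geq \ell_j\theta_{\min}^2$, together with $\theta_{\min}^2/\sigma^2 > 8$ and (25), implies

$$ \lambda_j \geq \nu_{\ell_j} > 0, $$

and therefore,

$$ \exp(-\lambda_j^2) \leq \exp(-\nu_{\ell_j}^2). \tag{26} $$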
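Combining (21) and (26) and taking the summation over all possible error events, we get

$$ \sum_{j=2}^{N} \Pr(\mathcal{A}_j^c) \leq \sum_{j=2}^{N} \exp(-\lambda_j^2) \leq \sum_{j=2}^{N} \exp(-\nu_{\ell_j}^2) = \sum_{\ell=1}^{k} \binom{k}{\ell}\binom{p - k}{\ell} \exp(-\nu_\ell^2) \leq k \max_{1 \leq \ell \leq k}\left\{\binom{k}{\ell}\binom{p - k}{\ell} \exp(-\nu_\ell^2)\right\}, $$

where we used the fact that exactly $\binom{k}{\ell}\binom{p - k}{\ell}$ support vectors $s_j$ satisfy $\ell_j = \ell$. As we mentioned earlier, the sum $\sum_{j=2}^{N} \Pr(\mathcal{A}_j^c)$ should tend to zero as the dimension $p$ grows. This will
hold if

$$ \lim_{p \to \infty} \max_{1 \leq \ell \leq k}\left\{\log k + \log\binom{k}{\ell} + \log\binom{p - k}{\ell} - \nu_\ell^2\right\} = -\infty. \tag{27} $$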
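The combinatorial count used in the summation above can be verified directly: the number of supports $s_j$ that miss exactly $\ell$ entries of $s$ is $\binom{k}{\ell}\binom{p-k}{\ell}$, and summing over $\ell \geq 1$ gives $N - 1$ by Vandermonde's identity. The Python sketch below (hypothetical names) checks this by enumeration for a small case and also evaluates the union-bound sum $\sum_{\ell=1}^{k}\binom{k}{\ell}\binom{p-k}{\ell}e^{-\nu_\ell^2}$ with $\sigma = 1$ and $\nu_\ell = \frac{\sqrt{m-k}}{2} - \frac{2m}{\ell\,\theta_{\min}^2\sqrt{m-k}}$, to show how it collapses once $m$ is large enough. The values of $p$, $k$, $m$ and $\theta_{\min}^2$ are illustrative only.

```python
import itertools
import math
from math import comb


def count_by_overlap(p: int, k: int) -> dict:
    """Number of k-subsets s_j with |s / s_j| = ell, for ell = 0..k."""
    return {ell: comb(k, ell) * comb(p - k, ell) for ell in range(k + 1)}


def brute_force_counts(p: int, k: int) -> dict:
    """Same counts by enumerating all k-subsets against s = (1, ..., k); small p only."""
    s = set(range(1, k + 1))
    counts = {ell: 0 for ell in range(k + 1)}
    for s_j in itertools.combinations(range(1, p + 1), k):
        counts[len(s - set(s_j))] += 1
    return counts


def union_sum(p: int, k: int, m: int, theta_min2: float) -> float:
    """sum_{ell=1}^{k} C(k, ell) C(p - k, ell) exp(-nu_ell^2), with sigma = 1."""
    r = math.sqrt(m - k)
    total = 0.0
    for ell in range(1, k + 1):
        nu = r / 2 - 2 * m / (ell * theta_min2 * r)
        total += comb(k, ell) * comb(p - k, ell) * math.exp(-nu ** 2)
    return total


print(count_by_overlap(30, 4)[1], sum(count_by_overlap(30, 4).values()) - 1)   # 104 27404
print(union_sum(1000, 5, 50, 16.0), union_sum(1000, 5, 400, 16.0))
```

For $p = 1000$, $k = 5$, $\theta_{\min}^2 = 16$, the sum exceeds one at $m = 50$ (the union bound is useless) but is astronomically small at $m = 400$.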



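Without loss of generality, we assume that $\sigma = 1$. Let us define

$$ \alpha_\ell \triangleq \frac{\nu_\ell^2}{m - k}. $$

Applying (24), it is easy to show that

$$ \alpha_\ell = \left(\frac{1}{2} - \frac{2m}{\ell\,\theta_{\min}^2 (m - k)}\right)^2 \geq \left(\frac{1}{2} - \frac{4}{\theta_{\min}^2}\right)^2 > 0. $$

Therefore, using Stirling's approximation, (27) is satisfied asymptotically if

$$ m > k + \max_{1 \leq \ell \leq k} \frac{1}{\alpha_\ell}\left\{\log k + \ell \log\frac{k}{\ell} + \ell \log\frac{p - k}{\ell}\right\}. \tag{28} $$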



To find the maximum in (28), we consider separately the linear and sub-linear regimes.

1) $\ell = \Theta(k)$: We have

$$ m > c_1 \log k + c_2 k + c_3 k \log\frac{p - k}{k}, $$

for some constants $c_1, c_2, c_3 > 0$. Since $k \log\frac{p - k}{k}$ dominates the other terms asymptotically, we should have

$$ m = \Theta\left(k \log\frac{p - k}{k}\right). $$
2) $\ell = o(k)$: In this regime we have

$$ \ell \log\frac{k}{\ell} < k $$

and

$$ \ell \log\frac{p - k}{\ell} < k \log\frac{p - k}{k}. $$

Therefore, the result of the linear regime covers the sub-linear regime.
Thus, we showed that $m = \Theta\left(k \log\frac{p - k}{k}\right)$ measurements are sufficient for reliable $\ell_2$-norm support recovery
under the standard Gaussian measurement ensemble. ■
Based on Theorem 5, the sufficient number of measurements under different scalings of $k$ is given by

$$ k = \Theta(p) \implies m = \Theta(p), $$

$$ k = o(p) \implies m = \Theta\left(k \log\frac{p}{k}\right). $$

The necessary and sufficient conditions in different regimes for the standard Gaussian measurement
ensemble are shown in Table I.

Remark: The first row in Table I shows that one needs to take more measurements than the dimension
of the signal in order to estimate the exact support set. This seems to contradict the concept
of compressed sensing. One might think that this is an artifact of this particular way of sampling.
To show that this is not the case, let us assume that we have direct access to a noisy version of the
input signal $\theta$. This means that we use a square diagonal matrix $D$ instead of a Gaussian one to sample
the signal. In order to make the two scenarios comparable, we should make sure that the signal powers
after the measurement are equal. To this end, we need to put a gain of $\sqrt{k}$ on the main diagonal.

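Now consider two signals $\theta_1$ and $\theta_2$, each consisting of $k$ nonzero entries with amplitude $\theta_{\min}$, which differ in only one position. The probability of error of MLE is given by
$$\Pr_{\mathrm{ML}}(\mathrm{err}) = Q\left(\frac{\|D\theta_1 - D\theta_2\|}{2}\right) = Q\left(\sqrt{\frac{k}{2}}\,\theta_{\min}\right),$$
where $Q(\cdot)$ is the tail probability of a standard Gaussian random variable. In the regime considered in the first row of Table I, i.e., $\theta_{\min}^2 = \Theta(1/k)$, we obtain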

$$\Pr_{\mathrm{ML}}(\mathrm{err}) = Q(\text{constant}) > 0. \quad (29)$$
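The non-vanishing error in (29) can be illustrated with a small Monte Carlo experiment. The sketch below assumes unit noise variance ($\sigma = 1$), the diagonal gain $\sqrt{k}$ described above, and a hypothetical pair of supports $\{0,\dots,k-1\}$ and $\{1,\dots,k\}$ with all nonzero amplitudes equal to $\theta_{\min}$. For two equally likely Gaussian hypotheses at Euclidean distance $d$, the ML decision errs with probability $Q(d/2)$; here $d = \sqrt{2k}\,\theta_{\min}$, so choosing $\theta_{\min}^2 = 2/k$ makes the argument of $Q$ equal to 1 for every $k$. The helper names are hypothetical.

```python
import math
import random


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mle_error_direct(p: int, k: int, theta_min: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo error rate of the ML decision between two k-sparse signals.

    Hypothetical setup (requires k < p): theta1 is supported on {0, ..., k-1},
    theta2 on {1, ..., k}, every nonzero entry equals theta_min, and the
    observation is y = sqrt(k) * theta + e with e ~ N(0, I_p).
    """
    rng = random.Random(seed)
    gain = math.sqrt(k)
    t1 = [gain * theta_min if i < k else 0.0 for i in range(p)]
    t2 = [gain * theta_min if 1 <= i <= k else 0.0 for i in range(p)]
    errors = 0
    for _ in range(trials):
        y = [a + rng.gauss(0.0, 1.0) for a in t1]  # theta1 is the true signal
        d1 = sum((yi - a) ** 2 for yi, a in zip(y, t1))
        d2 = sum((yi - b) ** 2 for yi, b in zip(y, t2))
        if d2 < d1:  # ML picks the mean closest to y
            errors += 1
    return errors / trials


k = 20
theta_min = math.sqrt(2.0 / k)  # theta_min^2 = Theta(1/k)
print(mle_error_direct(p=50, k=k, theta_min=theta_min, trials=20_000))
print(q_func(math.sqrt(k / 2) * theta_min))  # Q(1)
```

Repeating the experiment for larger $k$ with $\theta_{\min}^2 = 2/k$ keeps the estimate near $Q(1) \approx 0.159$: the error probability does not decay, even with direct measurements.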
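| Regime | Necessary | Sufficient |
|---|---|---|
| $k = \Theta(p)$, $\theta_{\min}^2 = \Theta(1/k)$ | $\Theta(p\log p)$ | ■ |
| $k = \Theta(p)$, $\theta_{\min}^2 = \Theta(1)$ | $\Theta(p)$ | $\Theta(p)$ |
| $k = o(p)$, $\theta_{\min}^2 = \Theta(1/k)$ | $\Theta(k\log(p-k))$ | ■ |
| $k = o(p)$, $\theta_{\min}^2 = \Theta(1)$ | $\max\{\Theta(k), \Theta(\log(p-k))\}$ | $\Theta(k\log\frac{p}{k})$ |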

TABLE I

Necessary and sufficient conditions on the number of measurements required for reliable $\ell_2$-norm support recovery under the standard Gaussian measurement ensemble ($\sigma^2 = 1$).



Therefore, even with direct measurements, there is no hope of recovering the exact support in this regime. In [17], Wainwright showed that $\Theta(p\log p)$ measurements are indeed sufficient.

VI. Conclusions 

We considered the problem of recovering the support of a sparse vector from a set of noisy linear measurements, from an estimation theoretic point of view. We set the error metric between the true and the estimated support sets as the $\ell_2$-norm of their difference. Then, we investigated the fundamental
performance limit of any unbiased estimator of the support set using the Hammersley-Chapman-Robbins 
bound, where no specific assumption was made on the measurement matrix. This general bound led 
us to the necessary conditions on the number of measurements for successful support recovery, which 
we specifically evaluated for standard random Gaussian measurement ensembles. Then, we analyzed 
the performance of the maximum likelihood estimator and derived conditions under which it becomes 
unbiased and achieves the Hammersley-Chapman-Robbins bound. Applying these conditions provided us 
with the sufficient number of measurements for random Gaussian measurement ensembles. 
Appendix A

A. Proof of Lemma 1

MLE chooses $s'$ over $s$ if and only if
$$\min_{t' \in \mathcal{S}_{s'}}\|y - t'\| < \min_{t \in \mathcal{S}_s}\|y - t\|.$$
Let us assume that
$$\|e\| < \frac{\|x - P_{s'}x\|}{2}. \quad (A.1)$$



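For any $t' \in \mathcal{S}_{s'}$, we have
$$\begin{aligned}
\|y - t'\|^2 &= \|x - t' + e\|^2 \\
&\ge \|e\|^2 + \|x - t'\|^2 - 2\|x - t'\|\,\|e\| \\
&\overset{(a)}{>} \|e\|^2 \\
&= \|y - x\|^2 \\
&\ge \min_{t \in \mathcal{S}_s}\|y - t\|^2,
\end{aligned}$$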

where in (a) we used (A.1) together with $\|x - t'\| \ge \|x - P_{s'}x\|$, which gives $\|x - t'\| > 2\|e\|$. This implies that if $\|e\| < \|x - P_{s'}x\|/2$, MLE will not choose $s'$ over $s$.
Since the probability that MLE selects $s'$ among all possible support vectors is at most the probability that MLE chooses $s'$ over $s$, we get
$$\Pr(s') \le \Pr\left(\|e\| \ge \frac{\|x - P_{s'}x\|}{2}\right).$$
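Lemma 1 is a purely geometric statement, so it can be spot-checked numerically. The sketch below uses hypothetical choices throughout: a Gaussian matrix with small illustrative sizes, $\pm 1$ amplitudes on the true support $s$, and noise rescaled so that (A.1) holds. It computes the distances from $y$ to the column spans $\mathcal{S}_s$ and $\mathcal{S}_{s'}$ with modified Gram-Schmidt and checks that $y$ is always strictly closer to $\mathcal{S}_s$, i.e., MLE does not prefer $s'$. The helpers `residual_norm` and `lemma1_trial` are hypothetical names.

```python
import math
import random


def residual_norm(y, cols):
    """Distance from y to span(cols), via modified Gram-Schmidt (cols assumed independent)."""
    basis = []
    for c in cols:
        v = list(c)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    r = list(y)
    for q in basis:
        dot = sum(a * b for a, b in zip(r, q))
        r = [a - dot * b for a, b in zip(r, q)]
    return math.sqrt(sum(a * a for a in r))


def lemma1_trial(m, p, s, s_alt, rng):
    """One random instance; returns True if y is strictly closer to S_s than to S_{s'}."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)]

    def col(j):
        return [X[i][j] for i in range(m)]

    theta = [rng.choice((-1.0, 1.0)) for _ in s]  # hypothetical nonzero amplitudes
    x = [sum(X[i][j] * t for j, t in zip(s, theta)) for i in range(m)]
    gap = residual_norm(x, [col(j) for j in s_alt])  # ||x - P_{s'} x||
    e = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # Rescale the noise so that ||e|| < gap / 2, i.e. assumption (A.1) holds.
    scale = rng.uniform(0.0, 0.999) * (gap / 2) / math.sqrt(sum(a * a for a in e))
    y = [xi + scale * ei for xi, ei in zip(x, e)]
    return residual_norm(y, [col(j) for j in s]) < residual_norm(y, [col(j) for j in s_alt])


rng = random.Random(0)
s, s_alt = [0, 1, 2, 3, 4], [0, 1, 2, 3, 5]  # hypothetical supports differing in one index
print(all(lemma1_trial(20, 40, s, s_alt, rng) for _ in range(200)))
```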



B. Proof of Lemma 2

From Lemma 1, we know that if $\|e\| < d_{\min}/2$, MLE makes the correct choice. Therefore,
$$\Pr_{\mathrm{ML}}(\mathrm{err}) \le \Pr\left(\|e\| \ge \frac{d_{\min}}{2}\right) = 1 - \Pr\left(\frac{\|e\|^2}{\sigma^2} < r\right),$$
where $r = \left(\frac{d_{\min}}{2\sigma}\right)^2 = \beta m$ and $\beta$ is the distinguishability factor. The random variable $\|e\|^2/\sigma^2$ is distributed according to the chi-square distribution with $m$ degrees of freedom. By using the cumulative distribution function of the chi-square distribution, we obtain

$$\Pr_{\mathrm{ML}}(\mathrm{err}) \le 1 - \frac{\gamma(m/2, r/2)}{\Gamma(m/2)}, \quad (A.2)$$
where $\Gamma(m)$ is the Gamma function and $\gamma(m, x)$ is the lower incomplete Gamma function. It is easy to show that, for an even number $m$,
$$\frac{\gamma(m/2, r/2)}{\Gamma(m/2)} = e^{-r/2}\sum_{i=m/2}^{\infty}\frac{(r/2)^i}{i!}.$$



Since, by Taylor expansion, $e^{r/2} = \sum_{i=0}^{\infty}\frac{(r/2)^i}{i!}$, we obtain
$$\frac{\gamma(m/2, r/2)}{\Gamma(m/2)} = 1 - e^{-r/2}\sum_{i=0}^{m/2-1}\frac{(r/2)^i}{i!}. \quad (A.3)$$
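The closed form (A.3) can be cross-checked against a direct numerical integration of the chi-square density, which also confirms that the chi-square CDF used in (A.2) equals $\gamma(m/2, r/2)/\Gamma(m/2)$. This sketch uses only the Python standard library; composite Simpson's rule and the helper names are illustrative choices, not part of the paper, and the closed form is restricted to even $m$ as in the text.

```python
import math


def chi2_cdf_closed_form(m: int, r: float) -> float:
    """P(chi2_m < r) via the finite sum in (A.3); valid for even m >= 2."""
    if m <= 0 or m % 2:
        raise ValueError("closed form assumes an even m >= 2")
    x = r / 2.0
    term, partial = 1.0, 0.0  # term holds x**i / i!
    for i in range(m // 2):
        partial += term
        term *= x / (i + 1)
    return 1.0 - math.exp(-x) * partial


def chi2_cdf_numeric(m: int, r: float, n: int = 2000) -> float:
    """Independent check: composite Simpson's rule on the chi-square density (n even)."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")

    def pdf(t: float) -> float:
        if t <= 0.0:
            return 0.5 if m == 2 else 0.0
        log_pdf = (m / 2 - 1) * math.log(t) - t / 2 - (m / 2) * math.log(2.0) - math.lgamma(m / 2)
        return math.exp(log_pdf)

    h = r / n
    total = pdf(0.0) + pdf(r)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * pdf(j * h)
    return total * h / 3.0


for m in (2, 4, 10):
    print(m, chi2_cdf_closed_form(m, 8.0), chi2_cdf_numeric(m, 8.0))
```

For $m = 2$ the closed form reduces to $1 - e^{-r/2}$, the exponential CDF, which is a convenient spot check.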



Combining (A.2) and (A.3), we have
$$\Pr_{\mathrm{ML}}(\mathrm{err}) \le e^{-r/2}\sum_{i=0}^{m/2-1}\frac{(r/2)^i}{i!}. \quad (A.4)$$



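Note that $f(i) = \frac{(r/2)^i}{i!}$, $i \in \mathbb{N}$, satisfies $f(i+1)/f(i) = \frac{r/2}{i+1}$, so $f$ is strictly increasing as long as $i + 1 < r/2$. Therefore, from (A.4) we get
$$\begin{aligned}
\Pr_{\mathrm{ML}}(\mathrm{err}) &\le e^{-r/2}\sum_{i=0}^{m/2-1}\frac{(r/2)^i}{i!} \\
&\overset{(a)}{<} e^{-r/2}\,\frac{m}{2}\,\frac{(r/2)^{m/2}}{(m/2)!} \\
&\overset{(b)}{<} e^{-r/2}\,\frac{m}{2}\,\frac{(r/2)^{m/2}}{(m/(2e))^{m/2}} \\
&= \frac{m}{2}\left(\frac{e^{(\beta-1)/(2\beta)}}{\beta^{1/(2\beta)}}\right)^{-r} \\
&= \frac{m}{2}\,c(\beta)^{-r},
\end{aligned}$$
where in (a) we used $m/2 < r/2$ (i.e., $\beta > 1$), so that each of the $m/2$ terms of the sum is smaller than $f(m/2)$, and in (b) we used the inequality $m! > (m/e)^m$. It can be easily verified that $c(\beta) > 1$ for $\beta > 1$ and $c(\beta) \to \sqrt{e}$ as $\beta$ grows.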
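Because the chain of bounds above combines several elementary inequalities, a numerical check helps confirm the exponents. The sketch evaluates the right-hand side of (A.4) exactly (term by term in log space) and compares it with the final bound $\frac{m}{2}c(\beta)^{-r}$, where $r = \beta m$ and $c(\beta) = e^{(\beta-1)/(2\beta)}/\beta^{1/(2\beta)}$. It also checks $c(\beta) > 1$ for $\beta > 1$ and $c(\beta) \to \sqrt{e}$. The values of $m$ and $\beta$ and the helper names are arbitrary, hypothetical choices.

```python
import math


def err_bound_a4(m: int, r: float) -> float:
    """Right-hand side of (A.4): e^{-r/2} * sum_{i=0}^{m/2-1} (r/2)^i / i!, for even m and r > 0."""
    x = r / 2.0
    return sum(math.exp(-x + i * math.log(x) - math.lgamma(i + 1)) for i in range(m // 2))


def c_factor(beta: float) -> float:
    """c(beta) = e^{(beta - 1)/(2 beta)} / beta^{1/(2 beta)}."""
    return math.exp((beta - 1.0) / (2.0 * beta)) / beta ** (1.0 / (2.0 * beta))


def err_bound_final(m: int, beta: float) -> float:
    """Final bound (m/2) * c(beta)^{-r} with r = beta * m, computed in log space."""
    r = beta * m
    return math.exp(math.log(m / 2.0) - r * math.log(c_factor(beta)))


for m in (2, 10, 50):
    for beta in (1.5, 2.0, 4.0):
        print(m, beta, err_bound_a4(m, beta * m), err_bound_final(m, beta))
```

Since $c(\beta) > 1$ for $\beta > 1$, the final bound decays exponentially in $r = \beta m$, apart from the linear prefactor $m/2$.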